week 6: integers and other monsters

ordered categories

Ordered categorical outcomes

Ordered categories are variables that are discrete, like a count, except that the values merely indicate different ordered levels. Most Likert scale questions are ordered categories, even though we pretend they’re continuous (see polychoric correlations).

When we model an outcome as an ordered categorical variable, what we’re really asking is how moving along a predictor variable shifts predictions progressively through the outcome categories, in sequence. We must therefore model our outcomes in the right order.

To do so, we’ll use the principles of the generalized linear model discussed last week. That is, we use a link function – the CUMULATIVE LINK FUNCTION – to model the cumulative probability of each value, i.e., the probability of observing that value or any smaller one.
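As a minimal base-R sketch (the probabilities here are made up, not the Trolley data): `cumsum()` turns category probabilities into cumulative probabilities, and the cumulative link is just the logit of those.

```r
# Toy example: 5 ordered categories with made-up probabilities
p <- c(0.10, 0.20, 0.30, 0.25, 0.15)

cum_p <- cumsum(p)             # Pr(Y <= k), the cumulative probabilities
log_cum_odds <- qlogis(cum_p)  # the cumulative link: log-cumulative-odds

round(log_cum_odds, 2)
# -2.20 -0.85  0.41  1.73   Inf
```

The last value is logit(1) = Inf, which is why a model for K categories only needs K − 1 cut points.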

example: the trolley problem

data(Trolley, package = "rethinking")
trolley <- Trolley
head(trolley)
   case response order     id age male           edu action intention contact
1 cfaqu        4     2 96;434  14    0 Middle School      0         0       1
2 cfbur        3    31 96;434  14    0 Middle School      0         0       1
3 cfrub        4    16 96;434  14    0 Middle School      0         0       1
4 cibox        3    32 96;434  14    0 Middle School      0         1       1
5 cibur        3     4 96;434  14    0 Middle School      0         1       1
6 cispe        3     9 96;434  14    0 Middle School      0         1       1
  story action2
1   aqu       1
2   bur       1
3   rub       1
4   box       1
5   bur       1
6   spe       1
trolley %>% 
  ggplot(aes( x=response )) +
  geom_histogram()
trolley %>%
  count(response) %>% 
  mutate(pr_k     = n/nrow(trolley),
         cum_pr_k = cumsum(pr_k)) %>% 
  ggplot(aes(x = response, y = cum_pr_k)) +
  geom_point(size = 3, color = "#1c5253") +
  geom_line(color = "#1c5253", alpha = .3) + 
  labs(x = "response",
       y = "cumulative probability")
# primary data
trolley_plot <-
  trolley %>%
  count(response) %>%
  mutate(pr_k     = n / nrow(trolley),
         cum_pr_k = cumsum(n / nrow(trolley))) %>% 
  mutate(discrete_probability = ifelse(response == 1, cum_pr_k, cum_pr_k - pr_k))

# annotation
text <-
  tibble(text     = 1:7,
         response = seq(from = 1.25, to = 7.25, by = 1),
         cum_pr_k = trolley_plot$cum_pr_k - .065)

trolley_plot %>% 
  ggplot(aes(x = response, y = cum_pr_k,
             color = cum_pr_k, fill = cum_pr_k)) +
  geom_line() +
  geom_point(shape = 21, colour = "grey92", 
             size = 2.5, stroke = 1) +
  geom_linerange(aes(ymin = 0, ymax = cum_pr_k),
                 alpha = 1/2) +
  geom_linerange(aes(x = response + .025,
                     ymin = ifelse(response == 1, 0, discrete_probability), 
                     ymax = cum_pr_k),
                 color = "black") +
  # number annotation
  geom_text(data = text, 
            aes(label = text),
            size = 4) +
  scale_x_continuous(breaks = 1:7) +
  scale_y_continuous("cumulative proportion", breaks = c(0, .5, 1), 
                     limits = c(0, 1.1)) +
  theme(axis.ticks = element_blank(),
        axis.title.y = element_text(angle = 90),
        legend.position = "none")

On to the modeling. We’ll start with an intercept-only model.

\[\begin{align*} R_i &\sim \text{Categorical}(\mathbf{p}) \\ \text{logit}(p_k) &= \alpha_k - \phi \\ \phi &= 0 \\ \alpha_k &\sim \text{Normal}(0, 1.5) \end{align*}\]

Remember, \(\phi\) doesn’t have an intercept because there’s an \(\alpha_k\) for each cut point.

m1 <- brm(
  data = trolley,
  family = cumulative, 
  response ~ 1, 
  prior = c( prior(normal(0, 1.5), class = Intercept) ),
  iter=2000, warmup=1000, cores=4, chains=4,
  file=here("files/models/62.1")
)
summary(m1)
 Family: cumulative 
  Links: mu = logit; disc = identity 
Formula: response ~ 1 
   Data: trolley (Number of observations: 9930) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -1.92      0.03    -1.98    -1.86 1.00     2725     3116
Intercept[2]    -1.27      0.02    -1.32    -1.22 1.00     3960     3485
Intercept[3]    -0.72      0.02    -0.76    -0.68 1.00     4323     3760
Intercept[4]     0.25      0.02     0.21     0.29 1.00     4822     3486
Intercept[5]     0.89      0.02     0.85     0.93 1.00     4364     3378
Intercept[6]     1.77      0.03     1.71     1.83 1.00     4658     3543

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The thresholds are on the logit scale. Let’s convert them back to cumulative probabilities.

gather_draws(m1, `b_.*`, regex=T) %>% 
  mutate(.value = inv_logit_scaled(.value)) %>% 
  mean_qi
# A tibble: 6 × 7
  .variable      .value .lower .upper .width .point .interval
  <chr>           <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1 b_Intercept[1]  0.128  0.122  0.135   0.95 mean   qi       
2 b_Intercept[2]  0.220  0.212  0.228   0.95 mean   qi       
3 b_Intercept[3]  0.328  0.319  0.337   0.95 mean   qi       
4 b_Intercept[4]  0.562  0.552  0.571   0.95 mean   qi       
5 b_Intercept[5]  0.709  0.700  0.718   0.95 mean   qi       
6 b_Intercept[6]  0.854  0.847  0.861   0.95 mean   qi       

exercise

Create a plot showing the posterior predictive distribution for the intercept-only model (m1). Can you visualize both the posterior and the observed data?

solution

predicted_draws(m1, newdata = data.frame(1)) %>% 
  count(.prediction) %>%
  mutate(pr_k = n / sum(n)) %>% 
  ggplot(aes(x = .prediction, y = pr_k)) +
  geom_col(fill = "#1c5253", alpha = 0.7) +
  geom_point(data = trolley_plot, aes(x = response), 
             color = "#3d405b", size = 2) +
  labs(x = "response",
       y = "probability",
       title = "Posterior Predictive Distribution") 

adding predictors

Now let’s add the features of the story – contact, action, and intention – as predictors in our model. Let’s expand our mathematical model. Here’s our original:

\[\begin{align*} R_i &\sim \text{Categorical}(\mathbf{p}) \\ \text{logit}(p_k) &= \alpha_k - \phi_i \\ \phi_i &= 0 \\ \alpha_k &\sim \text{Normal}(0, 1.5) \end{align*}\]

McElreath proposes the following linear model for \(\phi_i\):

\[ \phi_i = \beta_1\text{action}_i + \beta_2\text{contact}_i + \beta_3\text{intention}_i \]
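To build intuition for how \(\phi_i\) works, here’s a hedged base-R sketch (the cut points are eyeballed from m1’s estimates, not drawn from the posterior): subtracting \(\phi\) from every cut point shifts probability mass across all seven categories at once.

```r
# Hypothetical cut points on the logit scale (roughly m1's intercepts)
alpha <- c(-1.92, -1.27, -0.72, 0.25, 0.89, 1.77)

# Pr(Y = k) for k = 1..7, given a linear predictor phi
category_probs <- function(alpha, phi) {
  cum_p <- c(plogis(alpha - phi), 1)  # cumulative probabilities; Pr(Y <= 7) = 1
  diff(c(0, cum_p))                   # difference to recover per-category probabilities
}

round(category_probs(alpha, phi = 0), 2)     # baseline
round(category_probs(alpha, phi = -0.7), 2)  # e.g. action = 1: mass shifts toward lower responses
```

Because \(\phi\) is subtracted, a negative coefficient raises every cumulative probability, pushing predictions toward lower responses.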

m2 <- brm(
  data = trolley,
  family = cumulative, 
  response ~ 1 + action + contact + intention, 
  prior = c( prior(normal(0, 1.5), class = Intercept),
             prior(normal(0, 0.5), class = b)),
  iter=2000, warmup=1000, cores=4, chains=4,
  file=here("files/models/62.2")
)

m2
 Family: cumulative 
  Links: mu = logit; disc = identity 
Formula: response ~ 1 + action + contact + intention 
   Data: trolley (Number of observations: 9930) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -2.83      0.05    -2.92    -2.74 1.00     2960     2654
Intercept[2]    -2.15      0.04    -2.23    -2.06 1.00     3324     2745
Intercept[3]    -1.56      0.04    -1.64    -1.49 1.00     2997     2965
Intercept[4]    -0.54      0.04    -0.61    -0.47 1.00     3180     2764
Intercept[5]     0.12      0.04     0.05     0.20 1.00     3373     3358
Intercept[6]     1.03      0.04     0.95     1.11 1.00     3586     3183
action          -0.70      0.04    -0.78    -0.62 1.00     3744     3404
contact         -0.95      0.05    -1.05    -0.85 1.00     3038     2764
intention       -0.72      0.04    -0.79    -0.64 1.00     4178     3090

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
gather_draws(m2, `b_.*`, regex=T) %>% 
  filter(!str_detect(.variable, "[0-9]")) %>% 
  mutate(.variable = str_remove(.variable, "b_")) %>% 
  ggplot( aes(x = .value, y = .variable) ) +
  geom_vline( aes(xintercept = 0), linetype = "dashed", alpha = .3 ) +
  stat_halfeye(fill = "#5e8485") +
  scale_x_continuous("marginal posterior", breaks = -5:0 / 4) +
  scale_y_discrete(NULL) 
nd <- 
  trolley %>% 
  distinct(action, contact, intention) %>% 
  mutate(combination = str_c(action, contact, intention, sep = "_"))

f <- predicted_draws( m2 , newdata = nd )

f
# A tibble: 24,000 × 9
# Groups:   action, contact, intention, combination, .row [6]
   action contact intention combination  .row .chain .iteration .draw
    <int>   <int>     <int> <chr>       <int>  <int>      <int> <int>
 1      0       1         0 0_1_0           1     NA         NA     1
 2      0       1         0 0_1_0           1     NA         NA     2
 3      0       1         0 0_1_0           1     NA         NA     3
 4      0       1         0 0_1_0           1     NA         NA     4
 5      0       1         0 0_1_0           1     NA         NA     5
 6      0       1         0 0_1_0           1     NA         NA     6
 7      0       1         0 0_1_0           1     NA         NA     7
 8      0       1         0 0_1_0           1     NA         NA     8
 9      0       1         0 0_1_0           1     NA         NA     9
10      0       1         0 0_1_0           1     NA         NA    10
# ℹ 23,990 more rows
# ℹ 1 more variable: .prediction <fct>
f %>% 
  as.data.frame() %>% 
  mutate(
    across( c(action, contact, intention) , 
            as.character),
    action  = str_c("action = ",  action),
    contact = str_c("contact = ", contact)) %>% 
  count(action, contact, intention, .prediction) %>% 
  ggplot( aes(x=.prediction, y=n, color=intention) )+
  geom_point() +
  geom_line(aes(group=intention)) +
  facet_grid(~action+contact) +
  labs( x="response", 
        y="count" )

exercise

Create a new model that models the interaction between intention and the other variables. Then:

  1. Compare the fits of these models using PSIS or WAIC.
  2. Plot the posterior predictive distribution. How does it differ from our previous model?

solution

m2_complex <- brm(
  data = trolley,
  family = cumulative, 
  response ~ 1 + action + contact + intention + 
    action:intention + contact:intention, 
  prior = c(prior(normal(0, 1.5), class = Intercept),
            prior(normal(0, 0.5), class = b)),
  iter = 2000, warmup = 1000, cores = 4, chains = 4,
  file=here("files/models/62.2com")
)

solution: compare

m2         <- add_criterion(m2,         "loo")
m2_complex <- add_criterion(m2_complex, "loo")

loo_compare(m2, m2_complex, criterion = "loo") %>% 
  print(simplify = F)
           elpd_diff se_diff  elpd_loo se_elpd_loo p_loo    se_p_loo looic   
m2_complex      0.0       0.0 -18464.7     40.4        11.0      0.1  36929.3
m2            -80.3      12.5 -18545.0     37.9         9.0      0.0  37090.0
           se_looic
m2_complex     80.8
m2             75.8
m2         <- add_criterion(m2,         "waic")
m2_complex <- add_criterion(m2_complex, "waic")

loo_compare(m2, m2_complex, criterion = "waic") %>% 
  print(simplify = F)
           elpd_diff se_diff  elpd_waic se_elpd_waic p_waic   se_p_waic
m2_complex      0.0       0.0 -18464.6      40.4         11.0      0.1 
m2            -80.3      12.5 -18545.0      37.9          9.0      0.0 
           waic     se_waic 
m2_complex  36929.3     80.8
m2          37090.0     75.8

solution: ppd

predicted_draws(object = m2_complex, newdata = nd) %>% 
  mutate(
    intention=as.character(intention),
    action  = str_c("action = ",  action),
    contact = str_c("contact = ", contact)) %>% 
  count(action, contact, intention, .prediction) %>% 
  ggplot( aes(x=.prediction, y=n, color=intention) )+
  geom_point() +
  geom_line(aes(group=intention)) +
  facet_grid(~action+contact) +
  labs( x="response", 
        y="count" )

solution: ppd

p1 = predicted_draws(object = m2, newdata = nd) %>% 
  mutate(model = "simple")
p2 = predicted_draws(object = m2_complex, newdata = nd) %>% 
  mutate(model = "complex")

p1 %>% full_join(p2) %>% 
  mutate(
    intention=as.character(intention),
    action  = str_c("action = ",  action),
    contact = str_c("contact = ", contact)) %>% 
  count(model, action, contact, intention, .prediction) %>% 
  ggplot( aes(x=.prediction, y=n, 
              color=interaction(intention,model,
                                sep="-",lex.order=TRUE)) )+
  geom_point() +
  geom_line(aes(group=interaction(intention,model,sep="-",lex.order=TRUE))) +
  facet_grid(~action+contact) +
  labs( x="response", 
        y="count",
        color = "intention+model")

Ordered categorical predictors

distinct(trolley, edu)
                   edu
1        Middle School
2    Bachelor's Degree
3         Some College
4      Master's Degree
5 High School Graduate
6      Graduate Degree
7     Some High School
8    Elementary School
trolley <-
  trolley %>% 
  mutate(edu_new = 
           recode(edu,
                  "Elementary School" = 1,
                  "Middle School" = 2,
                  "Some High School" = 3,
                  "High School Graduate" = 4,
                  "Some College" = 5, 
                  "Bachelor's Degree" = 6,
                  "Master's Degree" = 7,
                  "Graduate Degree" = 8) %>% 
           as.integer())

To incorporate this variable as a MONOTONIC CATEGORICAL PREDICTOR, we’ll set up the notation so that each step up in value comes with its own incremental, or marginal, effect on the outcome.

\[ \phi_i = \sum_{j=1}^{E_i - 1} \delta_j \] We only need 7 \(\delta\) parameters because the first category is absorbed into the intercept for \(\phi\). So our full formula for this parameter is: \[ \phi_i = \beta_1\text{action}_i + \beta_2\text{contact}_i + \beta_3\text{intention}_i + \beta_E\sum_{j=0}^{E_i-1} \delta_j \] where \(\beta_E\) is the maximum effect of education: the \(\delta_j\) form a simplex (they sum to 1, with \(\delta_0 = 0\)), so each one is the fraction of that maximum added by moving up one education level.
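In code, the construction looks like this (a hedged sketch: the simplex values are made up, chosen to roughly match m3’s posterior means below; since \(\delta_0 = 0\), the lowest education level contributes nothing):

```r
# delta is a simplex: 7 increments that sum to 1
delta <- c(0.26, 0.14, 0.19, 0.16, 0.04, 0.09, 0.12)
b_E   <- -0.05  # the maximum (total) effect of education

# Education term for a person at level E (1 = Elementary School, ..., 8 = Graduate Degree)
edu_term <- function(E) b_E * sum(delta[seq_len(E - 1)])

round(sapply(1:8, edu_term), 4)  # monotonically accumulates from 0 to b_E
```

Monotonicity is guaranteed because every step adds a non-negative increment; the Dirichlet prior only has to learn how the total effect is divided among the steps.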

\[\begin{align*} \text{response}_i &\sim \text{Categorical}(\mathbf{p}) \\ \text{logit}(p_k) &= \alpha_k - \phi_i \\ \phi_i &= \beta_1\text{action}_i + \beta_2\text{contact}_i + \beta_3\text{intention}_i + \beta_4 \, \text{mo}(\text{edu\_new}_i, \boldsymbol{\delta}) \\ \alpha_k &\sim \text{Normal}(0, 1.5) \\ \beta_1, \ldots, \beta_3 &\sim \text{Normal}(0, 1) \\ \beta_4 &\sim \text{Normal}(0, 0.143) \\ \boldsymbol{\delta} &\sim \text{Dirichlet}(2, 2, 2, 2, 2, 2, 2) \end{align*}\]

m3 <- 
  brm(data = trolley, 
      family = cumulative,
      response ~ 1 + action + contact + intention + mo(edu_new),  # note the `mo()` syntax
      prior = c(prior(normal(0, 1.5), class = Intercept),
                prior(normal(0, 1), class = b),
                # note the new kinds of prior statements
                prior(normal(0, 0.143), class = b, coef = moedu_new),
                prior(dirichlet(2, 2, 2, 2, 2, 2, 2), class = simo, coef = moedu_new1)),
      iter = 2000, warmup = 1000, cores = 4, chains = 4, 
      seed = 12,
      file = here("files/models/62.3"))

Warning: this will probably take 20 minutes or so to run on your laptop. You can download my model file here.

m3
 Family: cumulative 
  Links: mu = logit; disc = identity 
Formula: response ~ 1 + action + contact + intention + mo(edu_new) 
   Data: trolley (Number of observations: 9930) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -3.13      0.17    -3.53    -2.84 1.00     1707     2012
Intercept[2]    -2.45      0.17    -2.83    -2.16 1.00     1702     1934
Intercept[3]    -1.87      0.17    -2.25    -1.58 1.00     1708     1804
Intercept[4]    -0.85      0.17    -1.23    -0.56 1.00     1731     1956
Intercept[5]    -0.18      0.17    -0.56     0.11 1.00     1718     1925
Intercept[6]     0.73      0.17     0.35     1.01 1.00     1737     1988
action          -0.71      0.04    -0.78    -0.63 1.00     4604     3182
contact         -0.96      0.05    -1.06    -0.86 1.00     4521     2761
intention       -0.72      0.04    -0.79    -0.65 1.00     4514     2992
moedu_new       -0.05      0.03    -0.11    -0.00 1.00     1705     1866

Monotonic Simplex Parameters:
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
moedu_new1[1]     0.26      0.15     0.04     0.59 1.00     2292     3104
moedu_new1[2]     0.14      0.09     0.02     0.35 1.00     4405     2501
moedu_new1[3]     0.19      0.11     0.03     0.44 1.00     4005     2613
moedu_new1[4]     0.16      0.09     0.03     0.39 1.00     3636     2564
moedu_new1[5]     0.04      0.05     0.00     0.15 1.00     2606     1894
moedu_new1[6]     0.09      0.06     0.01     0.25 1.00     3747     2484
moedu_new1[7]     0.12      0.07     0.02     0.29 1.00     3829     2596

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s see the effect of education after controlling for story parameters.

nd <- distinct(
  trolley, 
  action, contact, intention, edu_new
)
predicted_draws(m3, nd) %>% 
  ungroup() %>% 
  count(edu_new, .prediction) %>% 
  with_groups(edu_new, mutate, total=sum(n)) %>% 
  mutate(pk = n/total,
         .prediction = as.numeric(.prediction),
         edu_new = as.factor(edu_new)) %>% 
  ggplot(aes(x = .prediction,  y = pk, color = edu_new)) +
  geom_point() +
  geom_line() +
  labs( x="response", 
        y="probability" ) +
  theme(legend.position = "top")

exercise

Create a new model that includes education as a regular categorical predictor (not monotonic) and compare it to m3. Your tasks:

  1. Fit the model using the original edu variable (not edu_new)
  2. Create a plot of the posterior predictive distribution from the new model.

Optional:

  3. Create a side-by-side visual comparison of the education effects.
  4. Interpret whether the monotonic constraint appears to improve the model fit.

solution: fit model

# Fit model with regular categorical education
m3_cat <- brm(
  data = trolley,
  family = cumulative,
  response ~ 1 + action + contact + intention + edu,
  prior = c(prior(normal(0, 1.5), class = Intercept),
            prior(normal(0, 1), class = b)),
  iter = 2000, warmup = 1000, cores = 4, chains = 4,
  file=here("files/models/62.3cat")
)

m3_cat
 Family: cumulative 
  Links: mu = logit; disc = identity 
Formula: response ~ 1 + action + contact + intention + edu 
   Data: trolley (Number of observations: 9930) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept[1]             -2.96      0.05    -3.06    -2.86 1.00     2945
Intercept[2]             -2.28      0.05    -2.37    -2.18 1.00     3064
Intercept[3]             -1.69      0.05    -1.78    -1.60 1.00     3071
Intercept[4]             -0.66      0.04    -0.75    -0.58 1.00     3178
Intercept[5]              0.01      0.04    -0.07     0.10 1.00     3191
Intercept[6]              0.92      0.05     0.83     1.01 1.00     3731
action                   -0.71      0.04    -0.79    -0.63 1.00     4290
contact                  -0.96      0.05    -1.07    -0.87 1.00     3938
intention                -0.72      0.04    -0.79    -0.65 1.00     4548
eduElementarySchool       0.91      0.20     0.51     1.30 1.00     4417
eduGraduateDegree        -0.17      0.06    -0.30    -0.06 1.00     3902
eduHighSchoolGraduate    -0.06      0.07    -0.19     0.07 1.00     4060
eduMastersDegree         -0.11      0.06    -0.22    -0.00 1.00     3674
eduMiddleSchool          -0.25      0.15    -0.54     0.04 1.00     4061
eduSomeCollege           -0.31      0.05    -0.40    -0.22 1.00     3476
eduSomeHighSchool         0.13      0.09    -0.05     0.31 1.00     4017
                      Tail_ESS
Intercept[1]              3032
Intercept[2]              3242
Intercept[3]              3462
Intercept[4]              3403
Intercept[5]              3188
Intercept[6]              3322
action                    3188
contact                   3308
intention                 3197
eduElementarySchool       2639
eduGraduateDegree         2979
eduHighSchoolGraduate     2920
eduMastersDegree          3003
eduMiddleSchool           2889
eduSomeCollege            3245
eduSomeHighSchool         3187

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

solution: ppd

nd <- distinct(
  trolley, 
  action, contact, intention, edu
)
predicted_draws(m3_cat, nd) %>% 
  ungroup() %>% 
  count(edu, .prediction) %>% 
  mutate(
    edu = factor(edu,
                 levels = c(
                  "Elementary School",
                  "Middle School",
                  "Some High School",
                  "High School Graduate",
                  "Some College" ,
                  "Bachelor's Degree",
                  "Master's Degree",
                  "Graduate Degree"))) %>% 
  with_groups(edu, mutate, total=sum(n)) %>% 
  mutate(pk = n/total,
         .prediction = as.numeric(.prediction)) %>% 
  ggplot(aes(x = .prediction,  y = pk, color = edu)) +
  geom_point() +
  geom_line() +
  labs( x="response", 
        y="probability" ) +
  theme(legend.position = "top")

solution: side-by-side

p1 <- predicted_draws(m3, newdata = distinct(trolley, action, contact, intention, edu_new)) %>%
  mutate(model = "ordered",
         edu = factor(edu_new, 
                      levels = c(1:8),
                      labels = c(
                  "Elementary School",
                  "Middle School",
                  "Some High School",
                  "High School Graduate",
                  "Some College" ,
                  "Bachelor's Degree",
                  "Master's Degree",
                  "Graduate Degree")))
p2 <- predicted_draws(m3_cat, newdata = distinct(trolley, action, contact, intention, edu)) %>%
  mutate(model = "categorical")

full_join(p1, p2) %>% 
  ungroup() %>% 
  count(model, edu, .prediction) %>% 
  with_groups(c(model,edu), mutate, total = sum(n)) %>% 
  mutate(pk = n/total,
         .prediction = as.numeric(.prediction)) %>% 
  ggplot(aes(x = .prediction, y = pk, color = edu)) +
  geom_point(aes(shape = model)) +
  geom_line(aes(linetype = model)) +
  labs(x = "response", y = "probability") +
  guides(color = "none") + 
  facet_wrap(~edu) + 
  theme(legend.position = "top")

solution: compare fit

m3     <- add_criterion(m3,     "loo")
m3_cat <- add_criterion(m3_cat, "loo")

loo_compare(m3, m3_cat, criterion = "loo") %>% 
  print(simplify = F)
       elpd_diff se_diff  elpd_loo se_elpd_loo p_loo    se_p_loo looic   
m3_cat      0.0       0.0 -18510.5     38.9        15.3      0.2  37021.0
m3        -30.2       7.7 -18540.7     38.1        11.2      0.1  37081.4
       se_looic
m3_cat     77.7
m3         76.2
m3     <- add_criterion(m3,     "waic")
m3_cat <- add_criterion(m3_cat, "waic")

loo_compare(m3, m3_cat, criterion = "waic") %>% 
  print(simplify = F)
       elpd_diff se_diff  elpd_waic se_elpd_waic p_waic   se_p_waic waic    
m3_cat      0.0       0.0 -18510.5      38.9         15.2      0.2   37021.0
m3        -30.2       7.7 -18540.7      38.1         11.2      0.1   37081.4
       se_waic 
m3_cat     77.7
m3         76.2